Weakly Supervised Action Localization by Sparse Temporal Pooling Network
نویسندگان
چکیده
We propose a weakly supervised temporal action localization algorithm on untrimmed videos using convolutional neural networks. Our algorithm learns from video-level class labels and predicts temporal intervals of human actions with no requirement of temporal localization annotations. We design our network to identify a sparse subset of key segments associated with target actions in a video using an attention module and fuse the key segments through adaptive temporal pooling. Our loss function is comprised of two terms that minimize the video-level action classification error and enforce the sparsity of the segment selection. At inference time, we extract and score temporal proposals using temporal class activations and class-agnostic attentions to estimate the time intervals that correspond to target actions. The proposed algorithm attains state-of-the-art results on the THUMOS14 dataset and outstanding performance on ActivityNet1.3 even with its weak supervision.
منابع مشابه
Object-Extent Pooling for Weakly Supervised Single-Shot Localization
In the face of scarcity in detailed training annotations, the ability to perform object localization tasks in real-time with weak-supervision is very valuable. However, the computational cost of generating and evaluating region proposals is heavy. We adapt the concept of Class Activation Maps (CAM) [28] into the very first weakly-supervised ‘single-shot’ detector that does not require the use o...
متن کاملTowards Weakly-Supervised Action Localization
This paper presents a novel approach for weakly-supervised action localization, i.e., that does not require per-frame spatial annotations for training. We first introduce an effective method for extracting human tubes by combining a state-of-the-art human detector with a tracking-by-detection approach. Our tube extraction leverages the large amount of annotated humans available today and outper...
متن کاملWeakly Supervised Object Detection with Pointwise Mutual Information
In this work a novel approach for weakly supervised object detection that incorporates pointwise mutual information is presented. A fully convolutional neural network architecture is applied in which the network learns one filter per object class. The resulting feature map indicates the location of objects in an image, yielding an intuitive representation of a class activation map. While tradit...
متن کاملMotion in action : optical flow estimation and action localization in videos. (Le mouvement en action : estimation du flot optique et localisation d'actions dans les vidéos)
With the recent overwhelming growth of digital video content, automatic video understanding has become an increasingly important issue. This thesis introduces several contributions on two automatic video understanding tasks: optical ow estimation and human action localization. Optical ow estimation consists in computing the displacement of every pixel in a video and faces several challenges inc...
متن کاملWeakly Supervised Semantic Segmentation Using Superpixel Pooling Network
We propose a weakly supervised semantic segmentation algorithm based on deep neural networks, which relies on imagelevel class labels only. The proposed algorithm alternates between generating segmentation annotations and learning a semantic segmentation network using the generated annotations. A key determinant of success in this framework is the capability to construct reliable initial annota...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1712.05080 شماره
صفحات -
تاریخ انتشار 2017